0 



3 - Si8" ob 



0 THE ASSISTANT COMMISSIONER OF PATENTS 
co Washington, D.C. 20231 

9 Sir: 



DOCKET NUMBER: AUS990891US1 
March 27, 2000 



Transmitted herewith for filing is the Patent Application of: 



° Inventor(s): Rabindranath Dutta, and Kamal Chandrakant Patel 

For: DETECTING COPYRIGHT VIOLATION VIA STREAMED EXTRACTION AND SIGNATURE ANALYSIS IN A 
METHOD, SYSTEM AND PROGRAM 



oOO 



Enclosed are: 

X Patent Specification and Declaration 
X 5_ sheets of drawing(s). 

X An assignment of the invention to International Business Machines Corporation (includes Recordation Form Cover Sheet)%^^^^ 
A certified copy of a application. 

Information Disclosure Statement, PTO 1449 and copies of references. 



The filing fee has 


been calculated 


as shown 


below: 








For 


Number 
Filed 


Number 
Extra 


Rate 


Fee 




Basic Fee 








$ 


690 


00 


Total Claims 


23 - 20 


3 


x 18 = 


$ 


54 


00 


Indep. Claims 


4-3 


1 


x 78 = 


$ 


78 


00 


MULTIPLE DEPENDENT 


CLAIM PRESENTED 




x 260 - 


$ 












TOTAL 


$ 


822 


00 



X Please charge IBM Corporation Deposit Account No. 09-0447 in the amount of $ 822.00 . A duplicate copy of this sheet is 
enclosed. 

X The Commissioner is hereby authorized to charge payment of the following fees associated with this communication or 
credit any overpayment to IBM Corporation Deposit Account 09-0447 . A duplicate copy of this sheet is enclosed. 

X Any additional filing fees required under 37 CFR §1.16. 

X Any patent application processing fees under 37 CFR §1.17. 



CERTIFICATE OF MAILING BY "EXPRESS MAIL" UNDER 37 CFR § 1.10 

"Express Mail" mailing label number: EL139165860US 
Date of Mailing: March 27, 2000 

I hereby certify that the documents indicated above are being deposited with the United States 
Postal Service under 37 CFR 1.10 on the date indicated above and are addressed to Box Patent 
Applications, Assistant Commissioner for Patents, Washington, D.C. 20231 and mailed on the 
above Date of Mailing with the above "Express Mail" mailing label number. 




Respectfully submitted, 




Sidney LvWeatherford 
Registration No. P45,602 for 
FELSMAN, BRADLEY, VADEN, 

GUNTER & DILLON, LLP 
201 Main Street, Suite 1600 
Fort Worth, Texas 76102 
Telephone (817) 332-8143 



Fort Worth/0116AD-37542/79028.1 



DETECTING COPYRIGHT VIOLATION VIA STREAMED EXTRACTION AND 
SIGNATURE ANALYSIS IN A METHOD, SYSTEM AND PROGRAM 
BACKGROUND OF THE INVENTION 

1. Technical Field: 

The present invention generally relates to copyright 
material and in particular to discovering copyright 
infringement on a network, including the Internet. Still 
more particularly, the present invention relates to 
discovering copyright infringement without infringing on 
copyright material . 

2. Description of the Related Art: 

Copyright infringement is a major problem on the 
Internet (Web) . Digital documents like Web pages, MP 3 
audio, etc. are very easy to copy and put on a Web site. 
Since the Web has documents on line that are reaching 
billions of pages it is extremely hard for a publisher to 
track down sites which have infringed on an author's 
copyright by posting copies of the author's original work. 
An article entitled "Extent of copyright infringement on the 
Web" in the September 14, 1999 issue of Fortune Investor 
News , details the extent of copyright violations on the Web. 
"There are more than 2 million web sites offering, linking 
or referencing "warez," the Internet code word for illegal 
copies of software. This problem has increased 
significantly over the past three years, from roughly 
100,000 warez sites two years ago, to 900,000 last year." 

Generally, in the past, utilizing a search engine 
service to detect copyright infringement would suffice. 
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Keywords would be entered into a search engine, which 
indexes a large portion of the Web, to determine candidate 
pages to search for copyright infringement. Typically 
hundreds, if not thousands of hits would be returned to the 
search engine based on the search criteria using keywords. 

The candidate pages were then downloaded to the author 
or publisher's computer. The searcher would then perform 
more computer aided processing on the candidate pages to 
determine potential infringers. If there were just a few 
pages, reading the downloaded files would be the next step 
to determine if there were any infringement. However, there 
would be many files to inspect and this would require a 
further search that involved more complex pattern matching. 
- This step would narrow the choices further so that visual 
inspection of the files could be made to see if a copyright 
was being violated. 

Unfortunately, the passage of The Digital Millennium 
Copyright Act, signed into law on October 28, 1998. has made 
the approach as described above untenable. The digital age 
has prompted the passage of strict laws on copyright 
protection by the United States Congress. A strict 
interpretation of the law would prevent anyone but exempted 
entities, from storing copies of copyrighted Web documents 
on their computer except for downloading incidental to 
viewing (caching and immediate viewing) . While the law is 
complex the only clear exemptions are: Internet Service 
Providers (ISPs) ; search engines - as long as they do not 
profit directly; non-profit educational institutions and 
system caching. 

Generally a publisher is not concerned about the 
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copying of a line or two of text or a few bars of music 
because that is not a violation of the "fair-use" act for 
copying. What he is most concerned about is the copying of 
entire paragraphs or sections of music verbatim. Even if 
data could be downloaded "legally" to disk, typical pattern 
matching algorithms take an inordinate amount of time when 
the matching strings that are very long (e.g., a text 
paragraph) . 

Due to the billions of Web pages on the Internet, and 
The Digital Millennium Copyright Act, the process of 
detecting unauthorized posting or copyright infringement on 
the Web becomes nearly impossible. Therefore it would be 
desirable to provide a process that would enable an author 
or publisher to do a reasonably thorough search of the 
Internet for copyright infringers without violating The 
Digital Millennium Copyright Act. Further, it would be 
desirable to detect Web pages that have copied or modified 
copyright digital data on the Internet, without extracting 
and storing pages to store and further process . 
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SUMMARY OF THE INVENTION 



It is therefore one object of the present invention to 
search the Internet for copyright infringing material. 

5 

It is another object of the present invention to review 
the bulk of potential infringing material automatically. 

It is yet another object of the present invention to 
10 review the bulk of potential infringing material without 

violating current digital copyright laws. 



The foregoing objects are achieved as is now described. 
j : S Signature analysis of streamed download data narrows a list 

IS of likely candidates after a search engine search has been 

^li performed. "Streamed download" means that the data is 

m brought byte by byte to the searcher's computer and then 

^ discarded. During the streaming process the subject data, 

□ in binary form, is parsed into segments and each segment is 

k'h provided with a signature based on the content of the 

LI segment. The signatures of the individual segments are then 

p compared to signatures of the material that may have been 

^ infringed. If there are no matches, the streamed download 

data is discarded. If there are enough matches of the 
25 signatures, the offending data is displayed for visual 

inspection. Only caching and display takes place if a 
potential infringing segment is discovered. 

The above as well as additional objectives, features, 
3 0 and advantages of the present invention will become apparent 

in the following detailed written description. 
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BRIEF DESCRIPTION OF THE DRAWINGS 



10 



w 



25 



30 



The novel features believed characteristic of the 
invention are set forth in the appended claims. The 
invention itself however, as well as a preferred mode of 
use, further objects and advantages thereof, will best be 
understood by reference to the following detailed 
description of an illustrative embodiment when read in 
conjunction with the accompanying drawings, wherein: 



Figure 1 depicts a high-level block diagram of a system 
for detecting copyright violation on the Internet in which a 
C3 preferred embodiment of the present invention may be 

implemented; 



Figure 2 is a high-level block diagram of a system for 
processing and providing electronic signatures for original 
copyright material in accordance with a preferred embodiment 
of the present invention; 

Figure 3A depicts a high-level block diagram of a 
system for detecting copyright infringing material in which 
a preferred embodiment of the present invention may be 
implemented; 

Figure 3B is a high-level block diagram for providing 
electronic signatures for potentially infringing material 
for comparison to original copyright material in accordance 
with a preferred embodiment of the present invention; 

Figure 4 depicts a method for detecting material on the 
Internet that infringes copyrights in accordance with a 
preferred embodiment of the present invention; 
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Figure 5 illustrates a device for providing signature 
analysis in accordance with a preferred embodiment of the 
present invention; and 

Figure 6 depicts a method of generating a signature for 
a data segment in accordance with a preferred embodiment of 
the present invention. 
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DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 

With reference now to the figures, and in particular 
with reference to Figure 1, a high-level block diagram of a 
5 system for detecting copyright violation on the Internet in 

which a preferred embodiment of the present invention may be 
implemented, is depicted. System 100 comprises server (or 
computer) 102 containing copyright material 104, search 
engine 106 (whose program may be resident on a server 

10 connected to the web or computer 102), Internet 108, 
multiple web pages 110, 112, and 114 and comparator 116. 

O In order for an author or publisher to detect copyright 

in violators on the Internet the copyright material must be 

11 available for comparison. In the present invention 

J : pJ copyright material 104 is available on server 102. The 

|;3 author provides keywords, critical to copyright material 

^ 104, to search engine 106 in a query to search for potential 

O infringers. Search engine 106, which may be located on 

server 102 or Internet 108, then searches its own database 
of Universal Resource Locator (URL) addresses and Internet 
O 108, for web pages that contain the keywords. 

Web page 1 110, Web page 2 112 through Web page n 114 
25 are returned in the form of URLs to server 102. Signatures 

are generated from Web documents pointed to by each URL . 
These signatures are compared with known signatures of 
copyright material 104 through a comparator 116 to determine 
if there are any potential matches based on the occurrences 
30 of keywords. Web page comparator 116 to determine if there 

are any potential matches based on the number of keyword 
occurrences . 
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Referring to Figure 2, a high-level block diagram of a 
system for processing and providing electronic signatures 
for original copyright material in accordance with a 
preferred embodiment of the present invention is 
5 illustrated. Original copyright material 200 is parsed into 

data segments and converted to binary form. The converted 
data segments are then processed through signature generator 
202 and corresponding electronic signatures 204 are 
generated. The signatures are stored to be utilized later 
10 for comparison with electronic signatures of potential 

infringing material . 

P Referring now to Figure 3A, a high-level block diagram 

of a system for detecting copyright infringing material in 
%}$ which a preferred embodiment of the present invention may be 

implemented is depicted. Computer 300 (a computer 
rg associated with the author or publisher of copyrighted 

material) is connected to Internet 302. Search engine 304 
□ may be resident on computer 300 or a separate server that is 

M) also connected to Internet 302. Either search engine may be 

I* utilized to perform a keyword search for possible infringing 

O copyright material on the Internet. If potential infringing 

liS * material is found on server 306, search engine 304 returns 

the URL of infringing server 306 and any other URL upon 
25 which potential infringing material is found. The material 

is parsed and provided with electronic signatures for 

comparison to the copyright material present on computer 

300. 

3 0 Referring to Figure 3B, a high-level block diagram for 

providing electronic signatures for potentially infringing 
material in accordance with a preferred embodiment of the 
present invention, is depicted. Internet Web page 310, in 
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this instance, has been determined to have potential 
infringing material on the web site. The URL of Internet 
web page 310 is returned to the site requesting the search. 
All URLs are then searched automatically. Signatures 312 
5 for each data segment of the potential offending material 

from each URL are generated. 

Signatures 312 are sent to comparator 314 as are 
signatures 316 of the original copyrighted material. 
10 Comparisons are made of the signatures, not the material 

itself, of both original and potential infringing material 
and any matches are returned to the searcher. 

j'j* 3 Referring now to Figure 4, a method for detecting 

i)B material on the Internet that infringes copyrights in 

^ accordance with a preferred embodiment of the present 

|;n invention, is illustrated. In this example of the method, a 

v * textual document will be used as an example. The process 

o may be applied to audio, video graphic - any file or 

Mi material that can be posted to a web site and converted into 

streaming data and provided with electronic signatures. The 
p process begins with step 400, which illustrates a searcher 

accessing the Internet. The process proceeds to step 402, 
which depicts a search being performed utilizing keywords 
25 that are critical to identifying potential infringing 

material. The search engine may be accessed on the 
searcher's computer or a search engine connected to the 
Internet. The process passes to step 404, which illustrates 
a determination of whether a file contains possible 
3 0 infringing documents. If enough keywords with enough 

frequency are found, a document is considered a potential 
infringing document. If there is no document found on the 
web site that contains enough instances of the keywords, the 
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process passes to step 405. If there are documents found on 
the site that contain enough instances of the keywords, the 
process passes instead to step 406, which depicts the search 
engine providing a list of links to potential infringing 
5 documents . 

The process next proceeds to step 408, which 
illustrates accessing one of the links on the list and 
beginning to process the referenced document. In this 

10 instance, the textual data is set for streaming to random 

access memory on the computer that is searching for the 
material. The process then passes to step 410, which 

O illustrates parsing the textual data into segments. Next, 

the process moves to step 412, which depicts converting the 

M> parsed streaming data into a binary stream. The process 

then proceeds to step 414, which illustrates utilizing the 

m binary data to provide an electronic signature for each 

!'* segment. The process continues to step 416, which depicts 

signatures of the potentially infringing material being sent 

2lJD to a comparator for comparison to electronic signatures of 

[% the original, copyrighted material. 

lJ The process passes to step 418, which illustrates a 

determination of whether all the segment signatures of the 

25 offending material match all the segment signatures of the 

original material. If the determination is made that all 
signatures match, the process proceeds instead to step 420, 
which depicts loading the offending document to a visual 
display and visual inspection taking place. If the 

3 0 determination is made that all segment signatures do not 

match, the process continues to step 422, which illustrates 
a determination of whether any signatures match. If it is 
determined that no signatures match, the process then passes 
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to step 428, which depicts discarding the download and 
moving to another link. If the process determines that 
there are electronic signatures of the offending material 
that match (paragraphs are each assigned a signature, so 
5 that means at least a paragraph is exactly the same) the 

process proceeds from 422 to step 424, which illustrates 
electronically marking the data segments that match. The 
process continues to step 426, which depicts the marked data 
segments being presented for visual inspection and 
10 confirmation . 

Referring to Figure 5, a device for providing signature 
P analysis in accordance with a preferred embodiment of the 

in present invention, is illustrated. A 16-bit linear feedback 

y> shift register that can be implemented in hardware or 

^ software for generating electronic signatures is used to 

[0 compress a bit stream into a 16-bit unique signature. A 

?1SS; signature is generated for each data segment or, in the case 

□ of text documents, paragraphs. The incoming bit stream from 

|Nb the original document is processed and signatures for the 

G data segments are generated. Accuracy is important and a 

O sixteen stage shift register detects a multi-bit error in a 

data stream with a probability of 99.998% and a single bit 
error with a probability of 100%. Signature analysis may be 

2 5 implemented in hardware or software and there are various 

techniques for signature analysis which may be substituted. 

Referring next to Figure 6, a method of generating a 
signature for a data segment in accordance with a preferred 

3 0 embodiment of the present invention, is illustrated. An 

example of a portion of an HTML file is shown below: 
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<P>The Java 2D API enhances the graphics, text and imaging capabilities 
of the AWT, enabling the development of 

richer user interfaces and new types of Java applications. Along 
with these richer graphics, font and image APIs, 
5 the Java 2D API supports enhanced color definition and 

composition, hit detection on arbitrary geometric shapes 
and text, and a uniform rendering model for printers and display 
drivers.</P> 

10 The beginning of a paragraph (data segment) is marked by a 

"<P>" and the end of the paragraph is marked by "</P> n . The 
3 above HTML paragraph is converted to binary form and will be 

l*n seen as an incoming data stream from the Internet. The 

i ; S process begins with step 600, which depicts receiving a data 

US stream from the Internet. The process of detecting 

S : 'f copyright infringement is in progress and signature 

- ;l generation for the incoming data stream is commencing. The 

J**! process proceeds to step 602, which illustrates a 

m determination of whether an ON signal should be sent to the 

2 ; 4) tag counter. As shown above, a <P> marker in HTML indicates 

It the beginning of a paragraph (data segment) and when <P>, 

the paragraph beginning marker, is noted in the data stream, 
the process moves to step 603, which depicts an ON signal 
being sent to the Tag counter. If the determination is made 
2 5 that no beginning marker is detected, the process continues 

from step 602 to step 606, which illustrates a determination 
of whether an end marker, or end of tag, has been detected. 
If the determination is made that an end of tag has not been 
received, the process passes to step 608, which depicts the 
30 signature analyzer processing the data stream. If the 

determination is made that an end of tag bit is received, 
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the process proceeds instead to step 607, which illustrates 
sending an OFF signal to the tag counter. 

The process then continues to step 604, which depicts 
5 the tag counter receiving the end of tag signal. The 

process then passes to step 609, which illustrates a 
determination of whether the tag counter has received an OFF 
signal. If the determination is made that the tag counter 
has not received an off signal, the process passes to step 
10 604, and continues to wait for an end of tag signal. If the 

determination is made the tag counter has received an OFF 
signal, the process passes instead to step 610, which 

0 depicts generating a signature of the data stream between 
i% the beginning and end of tag markers . The process then 

IS proceeds to step 612, which illustrates the system storing 

1 ill , 

the signature for comparison to original, copyright material 
nj electronic signatures. 

n The present invention is faster than existing 

M) techniques and reduces the need for human intervention. 

[% Pages don't have to be saved - data is streamed and matching 

O is done utilizing a one pass pattern matcher. Copyright 

hJ infringement detection is made more deliberate and easier to 

accomplish. The use of signature matching rather than 
25 comparing infringing to original material allows for the 

automatic comparison of many questionable files in a short 
period of time which saves computing power and human 
intervention. Additionally, the detection is accomplished 
without infringing on the copyright of material that may not 
3 0 infringe. The present invention may be implemented in 

hardware as well as software. 

It is important to note that while the present 
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invention has been described in the context of a fully 
functional data processing system and/or network, those 
skilled in the art will appreciate that the mechanism of the 
present invention is capable of being distributed in the 
5 form of a computer usable medium of instructions in a 

variety of forms, and that the present invention applies 
equally regardless of the particular type of signal bearing 
medium used to actually carry out the distribution. 
Examples of computer usable mediums include: nonvolatile, 
10 hard-coded type mediums such as read only memories (ROMs) or 

erasable, electrically programmable read only memories 
(EEPROMs) , recordable type mediums such as floppy disks, 
p hard disk drives and CD-ROMs, and transmission type mediums 

?J such as digital and analog communication links. 

ii 

While the invention has been particularly shown and 
7n described with reference to a preferred embodiment, it will 

H be understood by those skilled in the art that various 

changes in form and detail may be made therein without 

v. sr 

y departing from the spirit and scope of the invention. 



Fort Worth/0116AD-37542/78455 .1 



AUS990891US1 



- 15 - 



CLAIMS : 

What is claimed is: 

1 1. A method for detecting copyright violation, comprising 

2 the following steps: 

3 receiving streamed data segments of suspected copyright 

4 infringing material; 

5 generating an electronic signature for each said data 

6 segment of said suspected material; and 

7 comparing each said signature for each said data 

8 segment of said suspected material to electronic signatures 

9 of original copyright material . 

2. The method of Claim 1, further comprising: 

hi extracting streamed data from the Web onto a server 

^1 connected to the Internet; and 

i ?% 

converting said streamed data to a binary stream. 

v. h£ 

3. The method of Claim 2, further comprising: 

[yB parsing said streamed data into data segments; and 

: L S generating a signature for each said streamed data 

i : 4 segment. 

1 4. The method of Claim 1, further comprising: 

2 . parsing original copyright material into data segments; 

3 and 

4 generating an electronic signature for each said 

5 copyright data segment. 

1 5. The method of Claim 1, further comprising: 

2 performing signature analysis on said streamed data 

3 segments; and 

4 visually examining said data segments matching any said 
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copyright data segments. 

6. The method of Claim 1, further comprising: 

providing key data critical to recognizing said 
copyright material for searching the Internet for infringing 
copyright material . 
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1 7. A system for detecting copyright violation, comprising: 

2 receiving means for receiving streamed data of 

3 suspected copyright infringing material; 

4 signature generation means for generating electronic 

5 signatures of said suspected material; and 

6 comparator means for comparing each said signature for 

7 each said data segment of said suspected material to 

8 electronic signatures of original copyright material. 

1 8, The system of Claim 7, further comprising: 

2 extraction means for extracting streamed data from the 

3 Internet onto a server connected to the Internet; and 

3 binary conversion means for converting said streamed 

data to a binary stream. 

j>l 9. The system of Claim 8, further comprising: 

Jk parsing means for parsing said streamed data into data 

t ; =B segments ; and 

14 means for generating a signature for each said streamed 

i§ data segment. 

ri 10. The system of Claim 7, further comprising: 

i'l parsing means for parsing original copyright material 

3 into data segments; and 

4 means for generating an electronic signature for each 

5 said copyright data segment. 

1 11. The system of Claim 7, further comprising: 

2 comparison means for performing signature analysis on 

3 said streamed data segments; and 

4 means for displaying and examining said suspected data 

5 segments matching any said copyright data segments if 

6 matching signatures exceed a predetermined number. 
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12. The system of Claim 7 , further comprising: 

key data critical to recognizing said copyright 
material for searching the Internet for infringing copyright 
material . 
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1 13 . A system for detecting copyright violation, comprising: 

2 a server connected to the Internet for storing 

3 copyright material; 

4 a search engine available to said server for searching 

5 the Internet; 

6 receiving means on said server for receiving streamed 

7 data of suspected copyright infringing material; 

8 signature generation means for generating electronic 

9 signatures of said suspected material; and 

10 comparator means for comparing each said signature for 

11 each said data segment of said suspected material to 

12 electronic signatures of original copyright material. 

; : | 14. The system of Claim 13, further comprising: 
12 means for loading said copyright material onto said 

server for comparison to infringing copyright material. 

i ; i 15. The system of Claim 13, further comprising: 

J;2 data critical to said copyright material for searching 

lQ> the Internet for infringing copyright material . 

i;i 16. The system of Claim 13, further comprising: 
Cl data streaming means for streaming suspected infringing 

3 data files from locations on the Internet. 

1 17. The system of Claim 13, further comprising: 

2 conversion means for converting said streaming data to 

3 a binary stream. 
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1 18. A computer program product within a computer readable 

2 medium having instructions for detecting copyright 

3 violation, comprising: 

4 instructions within said computer readable medium for 

5 receiving streamed data of suspected copyright infringing 

6 material ; 

7 instructions within said computer readable medium for 

8 generating electronic signatures of said suspected material; 

9 and 

10 instructions within said computer readable medium for 

11 comparing each said signature for each said data segment of 

12 said suspected material to electronic signatures of original 

13 copyright material . 

, ^ 

[i 19. The computer program product of Claim 18, further 

j'§ comprising : 

113 instructions within said computer readable medium for 

i ; 4 extracting streamed data from the Web onto a server 

l§ connected to the Internet; and 

i§ instructions within said computer readable medium for 

converting said streamed data to a binary stream. 

Ill 20. The computer program product of Claim 19, further 

2 comprising : 

3 instructions within said computer readable medium for 

4 parsing said streamed data into data segments; and 

5 instructions within said computer readable medium for 

6 generating a signature for each said streamed data segment. 

1 21. The computer program product of Claim 18, further 

2 comprising: 

3 instructions within said computer readable medium for 

4 parsing original copyright material into data segments; and 
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5 instructions within said computer readable medium for 

6 generating an electronic signature for each said copyright 

7 data segment . 

1 22. The computer program product of Claim 18, further 

2 comprising: 

3 instructions within said computer readable medium for 

4 performing signature analysis on said streamed data 

5 s egment s ; and 

6 instructions within said computer readable medium for 

7 visually examining said data segments matching any said 

8 copyright data segments. 

j ; l 23. The computer program product of Claim 18, further 

IQ, comprising : 

^3 instructions within said computer readable medium for 

r| providing key data critical to recognizing said copyright 

>i material for searching the Internet .for infringing copyright 

r€ material . . 
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DETECTING COPYRIGHT VIOLATION VIA STREAMED EXTRACTION AND 
SIGNATURE ANALYSIS IN A METHOD, SYSTEM AND PROGRAM 
ABSTRACT OF THE DISCLOSURE 



Signature analysis of streamed download data narrows a 
5 list of likely candidates after a search engine search has 

been performed. "Streamed download" means that the data is 
brought byte by byte to the searcher's computer and then 
discarded. During the streaming process the subject data, 
in binary form, is parsed into segments and each segment is 
10 provided with a signature based on the content of the 

segment. The signatures of the individual segments are then 
O compared to signatures of the material that may have been 

infringed. If there are no matches, the streamed download 
LJ data is discarded. If there are enough matches of the 

1:5 signatures, the offending data is displayed for visual 

m inspection. Only caching and display takes place if a 

potential infringing segment is discovered. 
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DOCKET NUMBER: AUS990891US1 
DECLARATION AND POWER OF ATTORNEY FOR 
PATENT APPLICATION 

As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next 
to my name; 

I believe I am an original, first and joint inventor of the subject matter 
which is claimed and for which a patent is sought on the invention entitled 

DETECTING COPYRIGHT VIOLATION VIA STREAMED EXTRACTION AND SIGNATURE ANALYSIS 

IN A METHOD, SYSTEM AND PROGRAM 

the specification of which (check one} 

X is attached hereto. 

was filed on 

as Application Serial No. 

and was amended on 

(if applicable) 

I hereby state that I have reviewed and understand the contents of the above 
identified specification, including the claims, as amended by any amendment 
referred to above . 

I. acknowledge the duty to disclose information which is material to the 
patentability of this application in accordance with Title 37, Code of Federal 
Regulations, §1.56. 

I hereby claim foreign priority benefits under Title 35, United States Code, §119 
of any foreign application (s) for patent or inventor's certificate listed below 
and have also identified below any foreign application for patent or inventor's 
certificate having a filing date before- that of the application on which 
priority is claimed: 

Prior Foreign Application { s ) : Priority Claimed 

Yes No 

(Number) (Country) ( Day /Month/ Year ) 

I hereby claim the benefit under Title 35, United States Code, §120 of any United 
States application (s) listed below and, insofar as the subject matter of each of 
the claims of this application is not disclosed in the prior United States 
application in the manner provided by the first paragraph of Title 35, United 
States Code, §112, I acknowledge the duty to disclose information material to 
the patentability of this application as defined in Title 37, Code of Federal 
Regulations, §1.56 which occurred between the filing date of the prior 
application and the national or PCT international filing date of this 
application : 



(Application Serial #) (Filing Date) (Status) 

I hereby declare that all statements made herein of my own knowledge are true and 
that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false 
statements and the like so made are punishable by fine or imprisonment, or both, 
under Section 1001 of Title 18 of the United States Code and that such willful 
false statements may jeopardize the validity of the application or any patent 
issued thereon, 

POWER OF ATTORNEY: As a named inventor, I hereby appoint the following attorneys 
and/or agents to prosecute this application and transact all business in the 
Patent and Trademark Office connected therewith. 
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39,969; James H. Barksdale, Jr. Reg. No. 24,091; Anthony V. England, Reg. No. 
35,129; Leslie A. Van Leeuwen, Reg. No. 42,196; Christopher A. Hughes, Reg. No. 
26,914; Edward A. Pennington, Reg. No. 32,588; John E. Hoel, Reg. No. 26,279; and 
Joseph C. Redmond, Jr., Reg. No. 18,753; Marilyn S. Dawkins, Reg. 31,140; Andrew 
Mitchell Harris, Reg. No. 42,638; Richard N. McCain, Reg. No. 43,785; Andrew J, 
Dillon, Reg. No. 29,634; Max Ciccarelli, Reg. No. 39,454; Jack V. Musgrove, Reg. 
No. 31,986; Daniel E. Venglarik, Reg. No. 39,409; Brian F. Russell, Reg. No. 
40,796; John G. Graham, Reg. No. 19,563; Matthew W. Baca, Reg. No. 42,277; Justin 
M. Dillon, Reg, No. 42,486; Antony P. Ng, Reg. No. 43,427; Steven Lin, Reg. No, 
35,250; Matthew S. Anderson, Reg. No. 39,093; Sidney L. Weatherford, Reg. No. 
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DILLON, LLP, Lakewood on the Park, Suite 350, 7600B North Capital of Texas 
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